MIT 9.520/6.860 Project: Feature selection for SVM

Author

  • Antoine Dedieu
Abstract

We consider sparse learning for binary classification problems solved with linear support vector machines. We present two popular methods for variable selection, the SVM-Recursive Feature Elimination (SVM-RFE) algorithm and the 1-norm SVM, and propose a third, hybrid method: 1-norm RFE. Finally, we implement these three algorithms and compare their performance on synthetic and microarray datasets. The 1-norm SVM gives the lowest test error on synthetic datasets but selects more features. SVM-RFE is the best-performing approach on real datasets. 1-norm RFE obtains good classification accuracy while selecting the smallest support on all kinds of datasets.

1 A brief review of SVM

1.1 Initial motivation of SVM

We consider a set of $n$ pairs of training data $\{(x_i, y_i)\}_{i=1}^{n}$, with $(x_i, y_i) \in \mathbb{R}^p \times \{-1, 1\}$. Our goal is to learn a linear function $f(x) = x^T w + \beta$ yielding the classification rule $\mathrm{class}(x) = \mathrm{sign}(f(x))$. When the data points are linearly separable (the two classes can be separated by a hyperplane), we aim at finding the optimal separating hyperplane that maximizes the margin $M$ between the two classes. Consequently, we consider the following maximization problem:

$$\max_{w \in \mathbb{R}^p,\, \beta \in \mathbb{R}} \ M \quad \text{subject to} \quad y_i(x_i^T w + \beta) \ge M \quad \forall i.$$

Using the scalability of the solutions, we set $\|w\|_2 = \tfrac{1}{M}$ and obtain the equivalent problem:

$$\min_{w \in \mathbb{R}^p,\, \beta \in \mathbb{R}} \ \frac{1}{2}\|w\|_2^2 \quad \text{subject to} \quad y_i(x_i^T w + \beta) \ge 1 \quad \forall i.$$

1.2 Primal and dual formulations

When the data is not linearly separable, we still want to maximize $M$ while allowing some points to lie on the wrong side of the margin. Hence, we define the support vector machine primal problem:

$$\min_{w \in \mathbb{R}^p,\, \beta \in \mathbb{R}} \ \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i(x_i^T w + \beta)\bigr) \qquad (1)$$

where $C$ is a penalization parameter which controls the trade-off between the classification error and the norm of the estimator. We also consider the dual of the SVM problem:

$$\max_{\alpha \in \mathbb{R}^n} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{1 \le i,j \le n} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{subject to} \quad \begin{cases} 0 \le \alpha_i \le C \quad \forall i \\ \sum_{i=1}^{n} \alpha_i y_i = 0 \end{cases} \qquad (2)$$

The representer theorem guarantees that the solution can be written as a linear combination of the training data: $w^* = \sum_{i=1}^{n} \alpha_i y_i x_i$. We call support vectors the points $x_i$ such that $\alpha_i \neq 0$. An illustrative code sketch of this primal-dual correspondence appears at the end of this section.

MIT 9.520/6.860: Statistical Learning Theory and Applications (Fall 2016).

1.3 Motivation for feature selection

In the past twenty years, the availability of large-scale datasets with hundreds of thousands of variables, such as gene expression microarray or text categorization datasets, has stimulated interest in feature selection and variable ranking algorithms. Guyon et al. (2003) point out in [2] the three advantages of such methods: they improve the speed and performance of the predictor, as well as the understanding of the underlying data. We now assume our data is high-dimensional: $p \gg n$.

The structure of this paper is as follows. In Sections 2 and 3, we present two common approaches to variable ranking using SVM. In Section 4, we propose our implementation of the two methods, as well as a third, hybrid one. In Section 5, we compare the classification performance of the three algorithms on synthetic and real datasets.
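As an illustration of the primal-dual correspondence above, here is a minimal sketch, assuming scikit-learn's SVC; this is not the project's own code, and the synthetic dataset and the choice C = 1.0 are arbitrary. It fits a linear soft-margin SVM and checks that the primal solution $w^*$ equals the dual combination of the support vectors.

```python
# Minimal sketch: fit a linear soft-margin SVM (problem (1)) and verify
# the representer theorem w* = sum_i alpha_i y_i x_i from problem (2).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=20, random_state=0)
y = 2 * y - 1  # relabel classes to {-1, +1} as in the text

clf = SVC(kernel="linear", C=1.0)  # C: penalization parameter of (1)
clf.fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only,
# i.e. the training points with alpha_i != 0 in the dual problem (2).
w_dual = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_dual, clf.coef_))            # True
print("number of support vectors:", len(clf.support_))
```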
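SVM-RFE, presented in Section 2, ranks variables by repeatedly training a linear SVM and discarding the feature with the smallest squared weight. As a preview, here is a hedged sketch under assumed choices (one feature removed per iteration and C = 1.0; the original algorithm of Guyon et al. also allows removing several features at a time).

```python
# Sketch of the SVM-RFE loop: train, rank features by w_j^2, drop the worst.
import numpy as np
from sklearn.svm import SVC

def svm_rfe(X, y, n_keep):
    """Indices of the n_keep features surviving recursive elimination."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        clf = SVC(kernel="linear", C=1.0).fit(X[:, active], y)
        w = clf.coef_.ravel()
        worst = int(np.argmin(w ** 2))  # ranking criterion: w_j^2
        del active[worst]
    return active
```

scikit-learn's sklearn.feature_selection.RFE implements the same elimination loop generically for any estimator exposing feature weights.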
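The 1-norm SVM of Section 3 instead replaces the $\ell_2$ penalty in (1) by an $\ell_1$ penalty on $w$, which drives some coordinates exactly to zero and thus performs selection during training. A minimal sketch, assuming scikit-learn's LinearSVC as a stand-in; note that it penalizes the squared hinge loss rather than the hinge loss of (1), so this is an approximation of the formulation developed later in the paper.

```python
# Sketch: an L1-penalized linear SVM zeroes out weights, so the selected
# support is simply the set of nonzero coordinates of w.
import numpy as np
from sklearn.svm import LinearSVC

def one_norm_svm_support(X, y, C=1.0):
    clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=C)
    clf.fit(X, y)
    return np.flatnonzero(clf.coef_.ravel())  # indices of selected features
```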
